1.  Introduction

Current electrical systems are limited in performance by electrical interconnect technology,
which determines overall processing speed. In addition, electrical interconnects that include
many long-distance connections require high power to drive. One of the best ways to overcome
these bottlenecks is to use optical interconnect to limit interconnect latency and power.

The 2002 Semiconductor Industry Association (SIA) roadmap update shows the substantial
problems associated with electrical interconnects on silicon chips. Off-chip long-distance
interconnections suffer in performance [1]. It is proposed to replace such interconnections with
optical interconnect to mitigate specific interconnect performance issues.

In 2000, D. A. B. Miller codified the physical advantages of optical interconnect over
electrical interconnect [2]. Some possible practical advantages of optical interconnects are
listed below.

•  Design  simplification: 

■  No  electromagnetic  wave  phenomena. 

■  No  distance  dependence. 

■  No  frequency  dependence. 

•  Architecture: 

■  Large  numbers  of  long  high-speed  connections. 

■  2D  interconnect  architecture. 

■  No  requirement  of  interconnect  hierarchy. 

•  Timing: 

■  Predictable  signal  timing. 

■  No  timing  skew. 

•  Other  physical  advantages: 

■  Power  savings. 

■ High interconnect density.

Because of the above advantages, optical interconnects could increase the overall performance of
electrical packages and reduce crosstalk, power consumption and signal latency. The lack of
computer-aided design (CAD) tools in optics necessitates the development of new CAD tools for
emerging technologies such as optical interconnect.

2.  Overview  of  Existing  Work 

In this section, existing work on optical interconnect is presented. As a special case,
a brief history of optical clock distribution is also presented.

2.1.  Optical  Interconnection 

The semiconductor diode laser was developed during the 1960s, and this was the starting point
for attempts to use optics in digital computation. The first demonstrations led to the conclusion
that optical devices could not substitute for transistors in general computing machines because
they consumed too much power [3]. However, the idea of using optics for communication was
conceived at that time. From the mid-1970s to the late 1980s, optical switching attracted
attention because an optical switch could be much faster than any electrical transistor [4]. This
is still valid, because nonlinear optics can make logic devices much faster than any electrical
device. In the early 1980s, quantum well structures were improved, and this led to further
interest in semiconductor optical switching devices.

In 1984, J. W. Goodman proposed optical interconnection of very large scale integration (VLSI)
electronics for both intra-chip and inter-chip data communications. These proposals were the
actual start of the field of optical interconnects [5]. The quantum-confined Stark effect was
discovered in III-V semiconductor quantum wells in 1984 [6]. This effect was important for
optical computing and optical interconnects because it allowed low-energy devices, 2D
interconnected modulators or switches, and large arrays of devices. Vertical cavity
surface-emitting lasers (VCSELs), a very important class of devices, were developed [7] and
demonstrated [8] in the late 1980s. The first demonstration of an electrically pumped VCSEL was
made successfully at room temperature. VCSELs became practical devices of great interest,
especially for low-cost optical fiber connections. Moreover, they are candidate devices for
optical interconnects to silicon chips.



In 1994, an optomechanical configuration was conceived [9] that was far less complex than any
approach current at the time. Implementing a Fourier-plane-based interconnect with an arbitrary
degree  of  space  variance,  it  comprises  only  two  component  aggregates  requiring  mutual 
alignment  in  free  space.  Connections  among  such  free-space  interconnected  modules  (FSIMs) 
are  effected  over  waveguide  ribbons  in  a  natural  fashion  obeying  the  principles  of  hierarchical 
interconnections:  all  signals  leave  chips  on  the  same  physical  transport  medium,  in  this  case 
through  the  optical  array  input  and  output  (I/O)  apertures  to  free-space  modes.  Use  of  a 
mechanically  compliant  medium  (ribbons)  decouples  system  scaling  from  free-space  alignment 
requirements. 

A  prototype  3D  optoelectronic  neural  network  was  implemented  in  1994  [10].  It  was 
composed  of  a  16-node  input,  4-neuron  hidden,  and  a  single-neuron  output  layer.  The  prototype 
used  high-speed  optical  interconnects  for  fan-out  and  mixed-signal  VLSI  circuits  for  fan-in.  In 
1997, S. P. Levitan et al. developed “Chatoyant”, a mixed-signal CAD tool for performing
end-to-end system simulations of free-space interconnection systems [11]. Chatoyant was able to
analyze optical, electrical, and mechanical trade-offs. The prototype system for intra multi-chip
module  (MCM)  interconnects  was  built  in  1999  [12].  This  system  supported  48  independent 
free-space  optical  interconnect  (FSOI)  channels  using  8  lasers  and  detectors.  All  chips  were 
integrated  on  a  ceramic  substrate  with  three  silicon  chips. 

M. Forbes et al. presented three approaches to optoelectronic interconnects between VLSI
chips [13]: fibre ribbons, planar waveguides and free-space optics. Their paper pointed out the
limitations of electrical interconnect and the advantages of optical interconnect.

Optical connections between individual computer systems are now available. N. Savage
anticipated that optical interconnection would be introduced in computers to connect circuit
boards within 2-5 years [14] and to connect chips within 5-10 years. On-chip optical
interconnects are expected to become feasible within 15 years.

2.2.  Optical  Clock  Distribution 

Claude Chappe invented the optical telegraph in the 1790s, which was the starting point of
optical communication systems. In 1870, John Tyndall demonstrated that light could be guided in
a water jet. However, the idea of a communication system based on the propagation of light
through circular dielectric waveguides was considered only from the mid-1960s, although some
theoretical studies were performed in the early years of the twentieth century [15, 16].

In 1984, J. W. Goodman suggested three optical clock distribution approaches [5]:
index-guided, unfocused free-space and focused free-space optical interconnect. In index-guided
optical interconnect, two types of waveguides can be used: optical fibers and optical waveguides
integrated on a suitable substrate. Both technologies provide compact, planar packaging of the
global optical clock distribution without diffractive components. In unfocused free-space
optical interconnect, the optical signal carrying the clock is broadcast to the entire
electronic chip. Because the detectors are located at the same distance from the focal point of
the lens, there is no clock skew. A focused free-space optical clock distribution uses a
holographic optical element, which acts as a complex grating.

In 1988, B. D. Clymer and J. W. Goodman presented the skew properties of an array of
optical transimpedance receivers associated with a hologram-based focused free-space optical
clock distribution [17]. The test circuit used 3 µm Metal Oxide Semiconductor Implementation
Service (MOSIS) technology with 18 optical receivers.

P. J. Delfyett et al. introduced the mode-locked operation of a semiconductor laser system as
a jitterless timing source [18]. They demonstrated optical clock distribution to 1024 separate
ports using optical fibers. The total accumulated timing jitter was less than 12 ps.

A board-level free-space optical clock distribution system implemented with a substrate-mode
hologram was presented by J. H. Yeh et al. in 1995 [19]. The system used an H-tree clock
distribution to avoid clock skew. With a 622 MHz clock signal, 36 ps of timing jitter and less
than 10 ps of clock skew were achieved.

In 1998, Y. Li et al. reported board-level, large-bandwidth optical clock distribution with a
fanout of 128 on a printed circuit board using silica and polymer optical fibers [20]. The
results showed multi-Gbps bandwidth capability.

A multi-GHz optical clock distribution on a Cray T-90 supercomputer multi-processor board
was presented in 1999 [21]. The optical clock signal is distributed to 48 fanout points on a
14.5 × 27 cm printed wiring board through a polyimide optical waveguide organized as an H-tree
structure.

Optical clock distribution technology eliminates disadvantages of electrical clock
distribution such as clock skew and timing jitter. Moreover, it imposes no limit on the
maximum modulation frequency of the optical signal.

3.  The  CAD  Tools  for  Optoelectronic  Systems 

3.1.  GOETHE  (Generic  Opto-Electronic  system  design  THEurgist) 

A CAD tool, GOETHE (Generic Opto-Electronic system design THEurgist), was developed to
optimize System-on-Chip (SoC) placement and the routing of electrical and optical interconnects
simultaneously, using free-space optical interconnect technology. GOETHE determines which
interconnects are routed electrically and which are routed optically, without exceeding the
routing capacity of the optical interconnect, while minimizing total electrical interconnect
length. Free-space optical interconnect technology is suitable for routing on-chip interconnects
using an optical interconnect layer [5]. Data throughput between modules could be enhanced by a
factor of a thousand through the use of free-space optical interconnect [14].

This research discusses the design of the circuit on the silicon substrate and its interaction
with the optical substrate shown in Figure 1.

[Figure 1 labels: beam arriving from Fourier plane; beam directed toward Fourier plane;
multimode waveguide; VCSEL; detector]

Figure 1. Optomechanical configuration in the neighborhood of the substrate

The integration configuration for optoelectronic substrate modules is shown in Figure 2. On
the silicon substrate, a VCSEL and photodetector array is bonded using flip-chip technology.
On the optical substrate, there is a micro-optical substrate which carries focal-plane
diffractive elements.


Figure 2. Integration configuration for optoelectronic substrate modules

3.1.1.  Assumption

All modules are assumed to be rectangular. Pins are assigned to the module periphery. A set
of netlists is generated randomly for a specified number of modules. It is also assumed that
the SoC operations are pipelined and that all module data transfers are buffered.

In this research, three arrangements of optical sensors are considered (see Figure 3). The gray
circles represent emitters and the white circles represent detectors.

Figure 3. Three different sensor arrangements, panels (a), (b) and (c)

In the horizontal and vertical directions, the patterns of Figure 3 are repeated up to the size
of the SoC. However, any regular arrangement of emitters and detectors can be specified as an
input to the CAD tool. It is therefore possible to experiment with different emitter-detector
arrangements to determine which gives the best performance.

3.1.2.  Optimization  Algorithms 

The  optimization  goals  are  as  follows: 

•  Given: The preliminary locations of all modules and netlists and the arrangements of
optical sensors in the VCSEL array.

•  Determine: The optimal placement of all modules.

•  Such that (optimization criteria): (a) total electrical interconnect length is minimized
and (b) the utilization of the optical routing capacity is maximized.

The optimization algorithm consists of a placement and routing step and a module compaction
step. These are described in Sections 3.1.2.1 and 3.1.2.3.

3.1.2.1.  Placement and Routing

A genetic algorithm is employed to optimize the placement of modules and the routing of
electrical and optical interconnects simultaneously. There are three steps in finding the
placement of modules that gives the minimum routing cost.

First, a population is generated as a group of many random orderings of the modules. The
population size is one of the inputs to the CAD tool. Each ordering is stored as a sequence of
numbers. Second, two of the better orderings in the population, called parents, are selected and
combined using crossover to create two new solutions, called children. During crossover, a
random point is picked in the parents' sequences and every number after that point is swapped
between them. When the placement of modules changes, the sequence of the longest interconnects
also changes.

However, simple crossover sometimes fails because each member of the population is a sequence
of module numbers: the children can repeat some numbers and omit others, so they are no longer
valid orderings. An example is shown below [22].


Parent 1:  1 2 3 4 5 6 7 8 9
Parent 2:  8 7 6 3 2 5 4 9 1
Child 1:   1 2 3 4 5 5 4 9 1
Child 2:   8 7 6 3 2 6 7 8 9

Figure 4. An example in which the crossover operation does not work
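
The failure in Figure 4 can be reproduced in a few lines of Python. This is only an illustrative sketch: the function names and the cut position (after the fifth module) are assumptions inferred from the figure, not code taken from the tool.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap everything after index `cut` between two parent sequences."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


def is_valid_ordering(seq, n_modules):
    """A valid ordering uses every module number 1..n_modules exactly once."""
    return sorted(seq) == list(range(1, n_modules + 1))


parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent2 = [8, 7, 6, 3, 2, 5, 4, 9, 1]

# Cut after the fifth module (assumed from Figure 4).
child1, child2 = one_point_crossover(parent1, parent2, cut=5)

print(child1, is_valid_ordering(child1, 9))
print(child2, is_valid_ordering(child2, 9))
```

Both children contain repeated module numbers and miss others, which is why a permutation-preserving operator is needed.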


To resolve this problem, partially matched crossover is employed, as shown in Figure 5.


Parent 1:  1 2 3 4 5 6 7 8 9
Parent 2:  8 7 6 3 2 5 4 9 1
Child 1:   1 3 2 5 4 6 9 8 7
Child 2:   8 7 6 2 3 4 5 9 1

Figure 5. Partially matched crossover
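
A minimal sketch of a textbook partially matched crossover (PMX) is given here for orientation. It is an assumption that the tool uses this standard variant: the function name `pmx` and the segment bounds are hypothetical, and the children it produces are valid orderings but are not expected to match the particular children printed in Figure 5.

```python
def pmx(parent1, parent2, lo, hi):
    """Textbook PMX: take parent2[lo:hi] as the matching segment, keep the
    rest of parent1, and repair duplicates through the position-wise mapping."""
    child = list(parent1)
    child[lo:hi] = parent2[lo:hi]
    # Value taken from parent2 -> value that parent1 held at the same position.
    mapping = {parent2[i]: parent1[i] for i in range(lo, hi)}
    for i in list(range(lo)) + list(range(hi, len(parent1))):
        gene = parent1[i]
        # Follow the mapping until the gene no longer clashes with the segment.
        while gene in mapping:
            gene = mapping[gene]
        child[i] = gene
    return child


parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent2 = [8, 7, 6, 3, 2, 5, 4, 9, 1]

# Matching segment at positions 3..5 (hypothetical choice).
child1 = pmx(parent1, parent2, 3, 6)
child2 = pmx(parent2, parent1, 3, 6)
print(child1)
print(child2)
```

Because every clash is repaired through the mapping, each child contains every module number exactly once, whatever segment is chosen.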
Finally, during mutation, modules are rotated to a randomly chosen orientation.

The above operations do not always turn good parents into better children. Therefore, if the
children are not better than their parents, they are discarded from the population. This
decision is based on the total routing cost (see Figure 9).

The algorithm attempts to replace long electrical interconnects with optical interconnects in
order of interconnect length. However, the breakpoint beyond which an electrical interconnect
can be replaced with an optical interconnect must be determined, because electrical
interconnects still outperform optical interconnects over very short distances. This is
described in Sections 3.1.2.5 and 3.1.2.6.

The  following  operations  are  the  overall  goals  of  the  optimization. 

•  Maximization  of  the  utilization  of  the  optical  routing  capacity. 

•  Minimization  of  the  length  of  the  critical  (longest)  electrical  interconnects. 

•  Minimization  of  the  total  electrical  interconnect  length. 

If an input port of one module is to be routed optically to an output port of another module,
then the input port and the output port must first be routed electrically to the nearest
detector and emitter respectively. The cost of routing electrically from the I/O ports to the
sensors is included in the total cost of the electrical interconnect routing.

Figure  6  shows  the  CAD  tool  layout  after  the  optimization  with  16  modules. 


Figure 6. The CAD tool layout with optimization

3.1.2.2.  Optical Routing Capacity

Unlike electrical interconnects, the physical length of an optical interconnect does not
matter. The optical routing capacity is determined by the number of optical directions in
which signals have to be routed [9]. Because of the manner in which the diffraction grating of
Figure 2 is fabricated, the optical routing capacity depends upon the number of optical
directions rather than upon the number of physical routings of optical interconnects. The
number of optical directions that the optical substrate can support is therefore one of the
inputs to the optimization tool.

Figure  7(a)  shows  the  physical  routing  of  signals  in  the  optical  substrate.  The  gray  circles 
represent  emitters  and  the  white  circles  represent  detectors.  Figure  7(b)  shows  optical  vectors 
that  the  routing  configuration  of  Figure  7(a)  reduces  to.  All  parallel  optical  directions  in  Figure 
7(a)  trim  down  to  a  single  optical  vector  in  Figure  7(b). 
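
The reduction from physical routings to optical vectors can be sketched as follows. This is a hedged illustration, not the tool's code: it assumes that two links share one optical vector exactly when the emitter-to-detector displacement (dx, dy) on the sensor grid is the same, and the function names are hypothetical.

```python
def count_optical_vectors(links):
    """Count distinct optical vectors among emitter/detector links.

    `links` is an iterable of ((ex, ey), (dx, dy)) grid coordinates.
    Assumption: links with the same displacement share one optical vector.
    """
    vectors = {(det[0] - emi[0], det[1] - emi[1]) for emi, det in links}
    return len(vectors)


def within_capacity(links, max_directions):
    """True if the layout needs no more directions than the substrate supports."""
    return count_optical_vectors(links) <= max_directions


links = [
    ((0, 0), (2, 1)),  # displacement (2, 1)
    ((1, 3), (3, 4)),  # parallel to the first link: same vector
    ((0, 0), (0, 5)),  # displacement (0, 5)
]
print(count_optical_vectors(links))
print(within_capacity(links, max_directions=2))
```

Three physical routings reduce to two optical vectors here, so the layout fits a substrate that supports two directions.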


3.1.2.3.  Module Compaction

The regions between modules are called channels. At the beginning of the optimization
process, the CAD tool places modules at a spacing equal to the wiring capacity. Then, virtual
vertical lines are placed in the channels, and the number of horizontal electrical
interconnects crossing the vertical line in each channel is counted. The difference between the
wiring capacity and this wiring density forms a region that we call the Ridge. The
Compression-Ridge method is applied to delete the Ridge region [23]. The same method is then
applied in the horizontal direction.
Figure 8. Compression-Ridge Method, panels (a) and (b)


3.1.2.4.  Cost Function


The cost function for the optimization is given in Figure 9. It is composed of the electrical
interconnect cost and the optical interconnect cost. The first term is the length of all
electrical interconnects plus the length of the electrical connections to all emitters and
detectors that are occupied by optical interconnects. Manhattan distance is used to calculate
the electrical interconnect length.


$$\text{Cost} = \sum (\text{Electrical interconnect length}) + w \left( \frac{v}{D} \right)$$

where
w = weight factor for the optical cost
v = the number of optical vectors in the current layout
D = the maximum number of optical directions that the optical substrate can accommodate

Figure 9. The cost function

The second term represents the optical interconnect cost. It attempts to maximize utilization
of the optical routing capacity without exceeding it; note that at each step of the algorithm
the longest interconnects are replaced with optical interconnects. w is the weight factor for
the optical cost and is set to 100 in this research, so the optical cost is the percentage
utilization of the optical routing capacity.
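
As a worked illustration of Figure 9, the cost can be evaluated as below. This is a sketch under stated assumptions: wires are given as pairs of (x, y) end points in arbitrary length units, the electrical stubs to emitters and detectors are assumed to be already included in the wire list, and the function names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def total_cost(electrical_wires, optical_vectors, max_directions, w=100.0):
    """Cost of Figure 9: summed electrical length plus w * (v / D).

    electrical_wires: list of ((x1, y1), (x2, y2)) end points, including
                      the stubs to emitters and detectors (assumption).
    optical_vectors:  v, the number of optical vectors in the layout.
    max_directions:   D, the number of directions the substrate supports.
    """
    electrical = sum(manhattan(a, b) for a, b in electrical_wires)
    optical = w * optical_vectors / max_directions
    return electrical + optical


wires = [((0, 0), (3, 4)), ((1, 1), (1, 2))]  # lengths 7 and 1
print(total_cost(wires, optical_vectors=150, max_directions=300))
```

With w = 100 the second term is the percentage of the optical routing capacity in use, 50 in this example, which is added to the electrical length of 8.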

3.1.2.5.  Speed Regression Model

In 1998, G. I. Yayla presented a comparison of electrical and optical interconnects from the
viewpoint of speed and energy consumption [24]. The result of interest in this research is the
comparison of off-chip electrical interconnect and free-space optical interconnect. Polynomial
regressions are performed as functions of interconnect length on data extracted from [24].

The  speed  regression  models  for  the  electrical  interconnect  and  the  optical  interconnect  are 
shown  in  Figure  10.  The  unit  of  interconnect  length  is  cm  and  the  unit  of  speed  is  MHz. 


Figure 10. Speed regression models for the electrical interconnect and the optical
interconnect (legend: Electrical, Optic; horizontal axis: Interconnection Length (cm))


Equations (1) and (2) represent the speed of the electrical interconnect and the speed of the
optical interconnect respectively, where x is the interconnect length (cm).

$$\text{Electrical} = a x^4 - b x^3 + c x^2 - d x + e \tag{1}$$

$$\text{Optical} = a x^4 - b x^3 + c x^2 - d x + e \tag{2}$$

The coefficients of the speed regression models are shown in Table 1.


From  the  speed  point  of  view,  the  optical  interconnect  is  always  dominant  over  the  electrical 
interconnect. 


3.1.2.6.  Energy Consumption Regression Model


Figure 11 shows energy consumption regression models for the electrical interconnect and the
optical interconnect. From the graph, the breakpoint beyond which the optical interconnect
consumes less energy than the electrical interconnect is obtained. It is about 2.7 cm.
Therefore, the tool may replace an electrical interconnect with an optical interconnect when
its length exceeds 2.7 cm, provided that the optical routing capacity is not exceeded.
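
The replacement rule can be sketched as a greedy pass over the nets, longest first. This is an assumed reading of the text rather than the tool's actual procedure: each net is a hypothetical (name, length in cm, optical vector) tuple, and a net whose vector is already in use is assumed not to consume additional routing capacity.

```python
BREAKPOINT_CM = 2.7  # energy breakpoint reported in the text


def select_optical(nets, max_directions, breakpoint_cm=BREAKPOINT_CM):
    """Return the names of nets to route optically.

    nets: list of (name, length_cm, vector) tuples (hypothetical format).
    Nets are visited longest first; a net is converted only if it is longer
    than the breakpoint and the number of distinct vectors stays in capacity.
    """
    used_vectors = set()
    optical = []
    for name, length_cm, vector in sorted(nets, key=lambda n: n[1], reverse=True):
        if length_cm <= breakpoint_cm:
            break  # every remaining net is at least as short
        if vector in used_vectors or len(used_vectors) < max_directions:
            used_vectors.add(vector)
            optical.append(name)
    return optical


nets = [
    ("n1", 8.0, (3, 1)),
    ("n2", 5.5, (0, 2)),
    ("n3", 4.0, (3, 1)),  # reuses the vector of n1
    ("n4", 3.0, (1, 1)),  # would need a third direction
    ("n5", 1.2, (0, 2)),  # shorter than the breakpoint
]
print(select_optical(nets, max_directions=2))
```

In this example n4 stays electrical because the two available directions are taken, and n5 stays electrical because it is shorter than the breakpoint.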


Figure 11. Energy consumption regression models for the electrical interconnect and the
optical interconnect (legend: Electrical, Optic; horizontal axis: Interconnection Length (cm))

The unit of interconnect length is cm and the unit of energy consumption is pJ. The
polynomial regression models for the electrical interconnect and the optical interconnect are
given by

$$\text{Electrical} = -a x^4 + b x^3 - c x^2 + d x - e \tag{3}$$

$$\text{Optical} = a \times (10^{-1} x)^4 - b x^3 + c x^2 + d x + e \tag{4}$$


where  x  is  interconnect  length  (cm). 

Table  2  shows  the  coefficients  of  the  regression  models  for  energy  consumption. 


At the beginning and the end of the optimization, the speed and the energy consumption are
calculated with the above analytical regression models to analyze the overall SoC performance.

3.1.2.7.  Pseudo Code for Genetic Optimizer

The pseudo code for a genetic algorithm [25] with the objective of minimizing the total
routing cost in an SoC is as follows:

Genetic_Algorithm_Objective( )

    Generate modules and netlists;
    Generate population;              // Section 3.1.2.1
    Set the generation number;
    for (each generation) {
        Partially matched crossover;  // Section 3.1.2.1 - Swap modules
        Module compaction;            // Section 3.1.2.3 - Find the optimal module placement
        Mutation;                     // Section 3.1.2.1 - Rotate modules
        Module compaction;            // Section 3.1.2.3 - Find the optimal module placement
        Evolution;                    // Section 3.1.2.4 - Optimize the total routing cost
    }
    Calculate speed improvement;      // Section 3.1.2.5
    Calculate energy saving;          // Section 3.1.2.6
    Save result files;

Figure 12. Pseudo code for genetic optimizer


During  evolution,  the  algorithm  makes  total  cost  minimal. 
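
The generation loop can be sketched in runnable form as follows. This is a simplified illustration and not the authors' C++ implementation: module compaction is omitted, the cost is a placeholder for Figure 9, mutation swaps two positions instead of rotating modules, and "better than the parents" is assumed to mean cheaper than the worse of the two parents. All names are hypothetical.

```python
import random


def pmx(p1, p2, lo, hi):
    """Textbook partially matched crossover on two permutations."""
    child = list(p1)
    child[lo:hi] = p2[lo:hi]
    mapping = {p2[i]: p1[i] for i in range(lo, hi)}
    for i in list(range(lo)) + list(range(hi, len(p1))):
        gene = p1[i]
        while gene in mapping:
            gene = mapping[gene]
        child[i] = gene
    return child


def toy_cost(order):
    """Placeholder for the routing cost of Figure 9 (assumption):
    the sorted ordering 1..n has cost 0."""
    return sum(abs(m - (i + 1)) for i, m in enumerate(order))


def genetic_optimize(n_modules, pop_size, generations, cost=toy_cost, seed=0):
    if pop_size < 4:
        raise ValueError("pop_size must be at least 4")
    rng = random.Random(seed)
    base = list(range(1, n_modules + 1))
    population = [rng.sample(base, n_modules) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)              # cheapest orderings first
        p1, p2 = population[0], population[1]  # the two best act as parents
        lo, hi = sorted(rng.sample(range(n_modules + 1), 2))
        children = [pmx(p1, p2, lo, hi), pmx(p2, p1, lo, hi)]
        for child in children:                 # mutation stand-in: one swap
            i, j = rng.sample(range(n_modules), 2)
            child[i], child[j] = child[j], child[i]
        threshold = cost(p2)                   # cost of the worse parent
        for k, child in enumerate(children):
            if cost(child) < threshold:        # otherwise the child is discarded
                population[-1 - k] = child     # replace one of the two worst members
    return min(population, key=cost)


best = genetic_optimize(n_modules=9, pop_size=8, generations=200)
print(best, toy_cost(best))
```

Because the two parents are never overwritten, the best cost in the population cannot increase from one generation to the next.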

3.1.3.  Results  and  Analysis 

In  this  section,  we  present  experimental  results  of  the  optimization  achieved  by  using  the 
CAD  tool.  All  algorithms  are  implemented  in  C++. 

3.1.3.1.  Simulation Results

As mentioned in Section 3.1.1, a set of netlists was generated randomly for a specified
number of modules. To compare overall SoC performance, cases with 9, 16, 25, 36 and 49
modules were simulated; the case of 36 modules is shown in this section. For all simulations,
the dimensions of the SoC were set to 10 × 10 cm².

Figure 13 shows the cost reduction versus the number of generations of the genetic optimizer
with 36 modules and 1000 netlists. In this simulation, the optical routing capacity was 300.

Figure 13. The cost graph


The result shows a saving of about 20% in the electrical cost defined in Figure 9. About
99.7% of the optical routing capacity is occupied.

Table  3  shows  reductions  in  interconnect  length  for  the  different  sensor  distributions  of 
Figure  3  with  36  modules  and  1000  netlists.  For  the  simulations,  optical  routing  capacity  was 
500. 


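Table 3. Comparison with different sensor distributions

| Arrangement | % of wires converted to optical links | % reduction in longest wire length | % reduction of total cost | No. of optical directions |
|-------------|---------------------------------------|------------------------------------|---------------------------|---------------------------|
| (a)         | 50                                    | 60                                 | 69                        | 495                       |
| (b)         | 48                                    | 52                                 | 65                        | 475                       |
| (c)         | 43                                    | 47                                 | 62                        | 428                       |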

Figure  14,  Figure  15  and  Figure  16  show  an  example  of  the  optimization  performed  by  the 
CAD  tool.  The  blue  lines  represent  electrical  interconnects  and  the  red  lines  represent  optical 
interconnects.  Figure  14  shows  a  layout  and  description  of  all  the  interconnects  at  the  beginning 
of  the  optimization  process. 



Figure  14.  Routing  without  optimization 

Figure 15 shows only the optical interconnects on the left and only the electrical
interconnects on the right at the end of the optimization.

Another result, for an ARM core, is shown in Figure 16. The first figure shows the routing
result before optimization, the second shows only the optical interconnects, and the third
shows only the electrical interconnects.


3.1.3.2.  Speed and Energy Issues

In  this  section,  we  evaluate  the  SoC  speed  improvement  and  energy  saving  with  9,  16,  25,  36 
and  49  modules.  The  graph  for  the  percentage  improvement  of  overall  SoC  speed  versus  the 
number  of  optical  vectors  with  various  numbers  of  modules  is  shown  in  Figure  17. 


Figure 17. The graph for speed improvement versus optical directions with various numbers of
modules (horizontal axis: the number of optical vectors)

The results are very encouraging and show that more than 54% improvement in chip speed
can be obtained through the use of optical interconnects.

Figure  18  shows  the  graph  for  the  percentage  saving  of  energy  consumption  versus  the 
number  of  optical  vectors  with  9,  16,  25,  36  and  49  modules. 


Figure 18. The graph for energy saving versus optical directions with various numbers of
modules


For 9 and 16 modules, the best energy savings are obtained with 235 and 200 optical vectors
respectively.

Table 4 shows the results of optimization performed with 9, 16, 25, 36 and 49 modules for
different numbers of optical directions supported by the optical substrate. The table entries
show the number of optimized optical vectors, which cannot exceed the number of optical
directions, the percentage saving in total energy consumption and the percentage improvement
in overall SoC speed.



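Table 4. The percentage improvement of energy consumption and speed with different numbers
of optical directions (column headings give the number of optical directions)

| No. of modules | Entry          | 10 | 50 | 100 | 200 | 300 | 400 | 500 |
|----------------|----------------|----|----|-----|-----|-----|-----|-----|
| 9              | Optical vector | 10 | 50 | 96  | 180 | 235 | 260 | 268 |
| 9              | Energy (%)     | 22 | 19 | 23  | 25  | 26  | 20  | 15  |
| 9              | Speed (%)      | 58 | 61 | 63  | 65  | 67  | 67  | 67  |
| 16             | Optical vector | 9  | 50 | 100 | 200 | 286 | 375 | 436 |
| 16             | Energy (%)     | 15 | 18 | 27  | 34  | 33  | 31  | 26  |
| 16             | Speed (%)      | 57 | 60 | 62  | 65  | 67  | 69  | 70  |
| 25             | Optical vector | 9  | 49 | 100 | 199 | 300 | 396 | 489 |
| 25             | Energy (%)     | 11 | 21 | 24  | 31  | 36  | 37  | 38  |
| 25             | Speed (%)      | 55 | 59 | 60  | 64  | 67  | 68  | 70  |
| 36             | Optical vector | 9  | 49 | 99  | 200 | 299 | 399 | 495 |
| 36             | Energy (%)     | 15 | 21 | 27  | 32  | 37  | 42  | 45  |
| 36             | Speed (%)      | 54 | 57 | 60  | 64  | 66  | 69  | 71  |
| 49             | Optical vector | 10 | 50 | 99  | 199 | 300 | 399 | 497 |
| 49             | Energy (%)     | 18 | 23 | 30  | 34  | 36  | 40  | 44  |
| 49             | Speed (%)      | 54 | 61 | 63  | 64  | 66  | 70  | 73  |
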

3.1.4.  Summary

In this research, a new approach to high-performance SoC design utilizing free-space optical
interconnect was described. The results show that, on average, more than 55% improvement in
overall SoC speed and more than 16% saving in total energy consumption are obtained through
the optimization process with the use of free-space optical interconnects. This translates to
an improvement in overall SoC performance by a factor of more than 1.5.

3.2.  BOSS (Board-level Optical clock Synthesis and Simulation tool)

Due  to  increasing  levels  of  integration  and  sophistication  in  packaging  technologies,  the 
problem  of  routing  electrical  control  and  synchronization  signals  to  the  various  subsystems  of  the 
package  has  assumed  great  significance.  These  control  and  synchronization  signal  networks, 
such  as  the  clock  distribution  networks  discussed  in  this  paper,  have  to  be  designed  very 
carefully  in  order  to  maximize  the  performance  of  the  assembled  electronic  package  while 
minimizing  manufacturing  costs.  Specifically,  for  clock  distribution,  the  skew  of  the  clock  signal 
from  the  source  to  the  various  destination  points  must  be  minimized  very  carefully  in  order  to 
maximize  the  electrical  performance  of  the  package.  In  addition,  overall  power  consumption 
must  be  minimized.  These  stringent  design  requirements  necessitate  the  development  of  new 
CAD  tools  for  emerging  technologies  such  as  optical  interconnect. 

In this research, we develop “BOSS” (Board-level Optical clock Synthesis and Simulation
tool), a CAD tool that finds an optimal clock routing network and the best optical data input
location for the network utilizing optical waveguide technology. Figure 19 shows the
integration configuration of an optical clock routing on a system-on-a-package (SOP) substrate.


Figure 19. Integration configuration for high-speed optical clock distribution using embedded optoelectronics

The physical design of the optical clock distribution network is composed of three steps: partitioning, rough routing and calibration for clock skew. The three steps are described in detail in Sections 3.2.2.1, 3.2.2.2 and 3.2.2.7.

3.2.1. Assumption

It is assumed that different combinations of L-shaped waveguides (with a 90° bend) are used to construct the optimal optical routing network. This provides design flexibility, as opposed to the use of a rigid H-tree structure. The optical waveguide is assumed to be transverse electric (TE) polarized with single-mode operation. It is also assumed that there is an isolation layer between the electrical and optical substrates to avoid signal absorption. The final layout is a 1-to-2^x fanout structure where x > 1. This means that the system is a single-clock system and the number of detectors on the board is 2^x.

From the latency point of view, an optical interconnection has three types of latency: transmitter latency, time of flight and receiver latency. In this paper, the latencies in the optical transmitter and the optical receiver are not considered. However, a recent publication reports that each of them is less than 100 psec [30].

3.2.2.  Optimization  Algorithms 

The  optimization  goals  are  as  follows: 

•  Given :  The  locations  of  all  the  terminal  points  (detectors)  to  which  the  clock  signal  is  to 
be  routed  optically. 

•  Determine :  (a)  The  location  of  the  clock  signal  transmitter  on  the  printed  wiring  board 
and  (b)  the  optimal  layout  of  the  optical  waveguides  from  the  transmitter  to  each  of  the 
detectors. 

•  Such  that  (optimization  criteria):  (a)  clock  signal  skew  is  minimized  and  (b)  bending 
and  propagation  losses  due  to  the  optical  waveguides  are  minimized. 

The optimization algorithm consists of three steps: layout partitioning, waveguide routing and a local routing heuristic. These are described next.

3.2.2.1. Layout Partitioning

The X-Y partition algorithm is used [26]. The board B is partitioned into two subregions, B_L and B_R, with equal numbers of detectors. The subregions B_L and B_R are then partitioned in the orthogonal direction. Alternating x- and y-direction partitioning is performed recursively until there are two detectors in each subregion. The algorithm is illustrated in Figure 20.
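
The recursive partitioning described above can be sketched in Python. This is a minimal illustration and not the BOSS implementation: the function name `xy_partition`, the split at the median of the sorted coordinates, and the requirement that the number of detectors be a power of two are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def xy_partition(detectors: List[Point], axis: int = 0) -> List[List[Point]]:
    """Recursively bisect the detector set, alternating x (axis 0) and y (axis 1) cuts.

    Recursion stops when a subregion holds two detectors (or fewer).
    Assumes len(detectors) is a power of two so that every cut is balanced.
    """
    if len(detectors) <= 2:
        return [list(detectors)]
    # Sort along the current cut direction and split into two equal halves.
    ordered = sorted(detectors, key=lambda p: p[axis])
    half = len(ordered) // 2
    next_axis = 1 - axis  # alternate between x and y
    return (xy_partition(ordered[:half], next_axis)
            + xy_partition(ordered[half:], next_axis))


if __name__ == "__main__":
    # Hypothetical 4 x 4 grid of detectors (arbitrary units).
    grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
    for leaf in xy_partition(grid):
        print(leaf)
```

Each returned leaf is one final subregion; the order of the cuts that produced it corresponds to the recursion stages used later for routing.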

3.2.2.2. Waveguide Routing


For optical clock distribution, an algorithm which we call the Method of Optical Centroid Searching (MOCS) is used. The basic idea of the MOCS algorithm is to minimize the path length difference from the transmitter to any of the detectors by finding "optical centroids" based upon Manhattan geometry. These optical centroids represent points in the layout grid that are equidistant from all other points in the same layout partition at each step of the recursive layout partitioning process. Hence, there are as many optical centroids as there are recursive calls in the layout partitioning algorithm. In order to feed all the detectors corresponding to a layout partition, the signal is fanned out to the detectors, or to other optical centroids, at the optical centroid corresponding to that partition. The MOCS algorithm is illustrated in Figure 21.

Let C be a set of four optical centroids. Initially, the four detectors themselves are taken as the first optical centroids.


$$C(x) = \{\, c_x(i) \mid c_x(i) \text{ is a centroid sorted in the x-direction} \,\} \qquad (5)$$

$$C(y) = \{\, c_y(i) \mid c_y(i) \text{ is a centroid sorted in the y-direction} \,\} \qquad (6)$$

where $i = 1, 2, 3, 4$.

The next optical centroid of C is located in the region $R_C$:

$$R_C = \{\, (x, y) \mid (c_x(2) < x < c_x(3)) \wedge (c_y(2) < y < c_y(3)) \,\} \qquad (7)$$

Let $C_L$, $C_R$, $C_T$ and $C_B$ be the initial optical centroids and $C_M$ be the next optical centroid, which is shown as a yellow dot in Figure 21. Let $PL_L$, $PL_R$, $PL_T$ and $PL_B$ be the path lengths from the left, right, top and bottom optical centroids to the detectors on the left, right, top and bottom sides, respectively.

The pseudo code for recursive MOCS is given in Figure 22. MOCS is applied recursively until only one optical centroid remains after searching.


Find_optical_centroid( )

for (all the segments by X-Y partitioning) {
    decide waveguide proceeding direction;
    if (waveguide proceeding direction is up or down) {
        abs_height = abs(C_L(y) - C_R(y));
        difference = abs(PL_L - PL_R);
        PL_1 = PL_L; PL_2 = PL_R; C_1 = C_L(y); C_2 = C_R(y);
    }
    else {
        abs_height = abs(C_T(x) - C_B(x));
        difference = abs(PL_T - PL_B);
        PL_1 = PL_T; PL_2 = PL_B; C_1 = C_T(x); C_2 = C_B(x);
    }
    if (PL_1 > PL_2) {
        if ((down and C_1 < C_2) or (up and C_1 > C_2)) {
            d_1 = C_1 - abs_height; d_2 = C_2 - difference;
        }
        else if ((down and C_1 >= C_2) or (up and C_1 <= C_2)) {
            d_1 = C_1; d_2 = C_2 - difference + abs_height;
        }
        else if ((left and C_1 < C_2) or (right and C_1 > C_2)) {
            d_1 = C_1; d_2 = C_2 - difference - abs_height;
        }
        else {
            d_1 = C_1 + abs_height; d_2 = C_2 - difference;
        }
    }
    else {
        if ((down and C_1 < C_2) or (up and C_1 > C_2)) {
            d_1 = C_1 + difference - abs_height; d_2 = C_2;
        }
        else if ((down and C_1 >= C_2) or (up and C_1 <= C_2)) {
            d_1 = C_1 + difference; d_2 = C_2 + abs_height;
        }
        else if ((left and C_1 < C_2) or (right and C_1 > C_2)) {
            d_1 = C_1 + abs_height; d_2 = C_2 - difference;
        }
        else {
            d_1 = C_1; d_2 = C_2 - difference - abs_height;
        }
    }
    leng_diff = (d_2 - d_1) / 2;
    if (waveguide proceeding direction is up or down) {
        C_M(x) = x_1 + leng_diff;
        if (C_1 < C_2) C_M(y) = C_1;
        else C_M(y) = C_2;
    }
    else {
        C_M(y) = y_1 + leng_diff;
        if (C_1 < C_2) C_M(x) = C_1;
        else C_M(x) = C_2;
    }
    Find_optical_centroid( );
}
Find the best location of an optical data input (Section 3.2.2.3);

Figure 22. Pseudo code for MOCS algorithm
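
The core idea behind the centroid search, choosing a merge point so that both branches see the same path length, can be illustrated for a single merge step. The sketch below is a simplified stand-in and not a transcription of Figure 22: it ignores the routing direction cases, bends and crossing checks, and the names `manhattan` and `balance_tap` are hypothetical. Given two sub-centroids with accumulated path lengths `pl1` and `pl2`, it returns the waveguide lengths `e1` and `e2` from the merge point to each sub-centroid such that `pl1 + e1 == pl2 + e2`.

```python
from typing import Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (rectilinear) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def balance_tap(c1: Point, pl1: float, c2: Point, pl2: float) -> Tuple[float, float]:
    """Return (e1, e2), the branch lengths from the merge point to c1 and c2.

    The lengths satisfy pl1 + e1 == pl2 + e2. When the imbalance exceeds the
    distance between the centroids, the merge point sits on one centroid and
    the other branch must be lengthened beyond the direct route (a detour).
    """
    d = manhattan(c1, c2)
    e1 = (d + pl2 - pl1) / 2.0
    if e1 < 0.0:
        # Branch 1 is already too long: merge at c1 and lengthen branch 2.
        return 0.0, pl1 - pl2
    if e1 > d:
        # Branch 2 is already too long: merge at c2 and lengthen branch 1.
        return pl2 - pl1, 0.0
    return e1, d - e1


if __name__ == "__main__":
    # Hypothetical sub-centroids (mm) with unequal accumulated path lengths.
    e1, e2 = balance_tap((0.0, 0.0), 2.0, (4.0, 0.0), 0.0)
    print(e1, e2, 2.0 + e1, 0.0 + e2)
```

Applying such a balancing step at every level of the partition keeps all transmitter-to-detector path lengths equal, which is the property MOCS is designed to achieve.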


For routing, no optical waveguide may cross another optical waveguide or a detector, because the discontinuity at an intersection would induce significant power loss.

3.2.2.3. Input Location

As mentioned earlier, the system designed in this paper is a single-clock system. BOSS provides layouts with 1-to-2^x fanout (x > 1). The optical data input location is determined at the end of routing through the MOCS algorithm. Its x coordinate is the same as that of the last optical centroid, and its y coordinate is the bottom edge of the board.


3.2.2.4.  Bending  Radius 


Figure 23 shows the wavefront of a fundamental guided mode traveling in an optical waveguide.

When the mode passes through a bent optical waveguide, the tangential velocity of the mode in the cladding layer, $v_{tan} = R\,(d\theta/dt)$, would have to exceed the velocity of light beyond a certain radius. Thus, the portion of the evanescent field tail beyond $x_R$ cannot stay in phase; it splits away from the guided mode and radiates into the cladding layer.

The rate of total power loss along z can be described by

$$\frac{dP_m(z)}{dz} = -2\alpha_m P_m(z) \qquad (8)$$

where $\alpha_m$ is the proportionality constant.

From Equation (8), the guided power for the mth mode can be written as

$$P_m(z) = P_{0,m}\, e^{-2\alpha_m z} \qquad (9)$$

where $P_{0,m}$ is the incident power for the mth mode and $z = \pi R/2$.

The constant $\alpha_m$ can be derived by calculating the radiating aperture [27].

(10) 

where $C_1$ and $C_2$ are constants.

Therefore,  Equation  (9)  is  a  function  of  bending  radius  of  optical  waveguide. 
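
As a small worked example, Equation (9) can be evaluated for one 90° bend by substituting z = πR/2. The sketch below takes the exponent of Equation (9) as -2·α_m·z and treats the attenuation constant as a plain input number, because its dependence on the bending radius (Equation (10)) is not reproduced here. The function name `bend_transmission` and the numerical values are hypothetical.

```python
import math


def bend_transmission(alpha_per_m: float, radius_m: float) -> float:
    """Fraction P_m(z) / P_0,m remaining after one 90-degree bend.

    Uses P_m(z) = P_0,m * exp(-2 * alpha_m * z) with z = pi * R / 2,
    the arc length of a quarter circle of radius R.
    """
    z = math.pi * radius_m / 2.0
    return math.exp(-2.0 * alpha_per_m * z)


if __name__ == "__main__":
    # Hypothetical attenuation constant (1/m) and bending radius (m).
    print(bend_transmission(alpha_per_m=50.0, radius_m=1.0e-3))
```

For a fixed attenuation constant a larger radius means a longer arc; the benefit of a large bending radius comes from the attenuation constant itself falling as the radius grows.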

In order to minimize clock skew, BOSS finds the maximum bending radius under the assumptions made in Section 3.2.1. This means that path lengths, and therefore clock skews, are minimized for a given partitioning and routing.

3.2.2.5. Loss Calculation

Two major loss terms in an index-guided optical interconnect are considered in evaluating the layouts produced by BOSS. The bending loss of the optical waveguide is derived in Section 3.2.2.4. To make it usable in BOSS, the bending loss computed for a specific set of parameters is curve-fitted with a Boltzmann function. The parameters used in this paper are taken from a real fabrication condition [28]: 1 µm Benzocyclobutene (BCB) as the core layer, SiO2 as the cladding layer, a waveguide thickness of 1 µm and a wavelength of 1.3 µm. The analytical regression model for the bending loss of the optical waveguide is shown in Equation (11).

$$BL = \frac{1.0508}{1 + e^{(\mathrm{radius} - 4 \times 10^{-5})/10^{-5}}} + 9.9 \times 10^{-4} \qquad (11)$$

Equation (11) shows that the bending loss saturates for bending radii above 240 µm; that is, the bending loss is negligible for bending radii larger than 240 µm. For the simulations, the minimum bending radius of the optical waveguide is assumed to be 100 µm.
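
The saturation behavior of the fitted model can be checked numerically. The sketch below evaluates Equation (11), reading its exponent as (radius - 4×10^-5)/10^-5 with the radius expressed in meters; that unit is an interpretation of the constants rather than something stated in the text, and the unit of BL itself is not specified. The function name `bending_loss` is hypothetical.

```python
import math


def bending_loss(radius_m: float) -> float:
    """Boltzmann-type regression model of Equation (11).

    radius_m: bending radius in meters (assumed unit).
    Returns BL in the (unspecified) unit of the fitted data.
    """
    return 1.0508 / (1.0 + math.exp((radius_m - 4e-5) / 1e-5)) + 9.9e-4


if __name__ == "__main__":
    # Tabulate the model at a few radii, including the 100 um minimum
    # and the 240 um saturation point mentioned in the text.
    for r_um in (40, 100, 240, 1000):
        print(f"{r_um:5d} um -> BL = {bending_loss(r_um * 1e-6):.6f}")
```

Beyond roughly 240 µm the exponential term dominates the denominator, so the first term vanishes and BL settles at the constant offset of 9.9×10^-4.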

The propagation loss, caused by scattering, material absorption and structural imperfections of the optical waveguide, can be estimated using a fiber scanning method [28]. The measured propagation loss is 0.36 dB/cm at a wavelength of 1.3 µm.

In  this  paper,  the  splitting  loss  of  optical  waveguide  is  assumed  to  be  negligible. 

3.2.2.6. Delay Calculation

The effective refractive index of the TE0 mode in the dielectric waveguide is about 1.4927. Let c be the speed of light in free space, $n_{eff}$ the effective refractive index and v the phase velocity of the guided mode in the material. Then

$$v = \frac{c}{n_{eff}} \approx 2 \times 10^{8} \ \text{(m/s)} \qquad (12)$$

Thus, the delay d is

$$d = \frac{x}{v} = \frac{x}{2 \times 10^{8}} \ \text{(sec)} \qquad (13)$$

where x = (a path length) - (the shortest path length).

In contrast, the delay $d_{electrical}$ in an electrical interconnection is

$$d_{electrical} = \frac{x}{c/5} = \frac{x}{0.6 \times 10^{8}} \ \text{(sec)} \qquad (14)$$

because the signal propagation speed for repeatered global electrical interconnections can be assumed to be approximately c/5 [29].
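
Equations (12) to (14) can be combined into a short numerical check. The sketch below is illustrative only: the constant and function names are hypothetical, and c is rounded to 3×10^8 m/s. It converts the 26.1 psec maximum optical skew reported in Section 3.2.3 into a path length difference and then evaluates the electrical delay over the same length, which comes out close to the 87.43 psec quoted there.

```python
C = 3.0e8        # speed of light in free space (m/s), rounded
N_EFF = 1.4927   # effective refractive index of the TE0 mode (from the text)


def optical_delay(x_m: float) -> float:
    """Delay (s) over a path length difference x_m in the waveguide: x / (c / n_eff)."""
    return x_m * N_EFF / C


def electrical_delay(x_m: float) -> float:
    """Delay (s) over x_m in a repeatered electrical interconnect: x / (c / 5)."""
    return x_m / (C / 5.0)


if __name__ == "__main__":
    optical_skew = 26.1e-12                  # s, maximum skew from Table 6
    x = optical_skew * C / N_EFF             # equivalent path length difference (m)
    print(f"path length difference: {x * 1e3:.3f} mm")
    print(f"electrical skew: {electrical_delay(x) * 1e12:.2f} psec")
```

The ratio of the two delays is 5/n_eff, about 3.35, independent of the path length.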

In order to minimize clock skew, the CAD algorithm finds the maximum bending radius under the assumptions made in Section 3.2.1. This means that path lengths, and therefore clock skews, are minimized for a given partitioning and routing.


3.2.2.7. Local Routing Heuristic

For each optical centroid, the maximum bending radius is different. Examples of the local routing stage are shown in Figure 24.

Different bending radii cause a difference in path length between the optical data input and the detectors, which results in signal timing skew. Therefore, the bending radii of all waveguides corresponding to the nth stage of recursion are made identical to avoid signal timing skew. The nth stage of recursion in Figure 22 corresponds to a local layout of the optical waveguides from a centroid (local centroid) to other centroids (sub-centroids) or to a set of detectors. For routing from a local centroid to a sub-centroid, the ideal choice is to pick a waveguide layout with the largest bending radius, since this minimizes bending losses. However, the use of interconnects with different bending radii (see Figure 24) can cause timing skews between the various signal paths. Thus, the largest feasible bending radius is determined for each interconnect from the local centroid to its sub-centroids, and all of these waveguides are then routed using the smallest of those radii. This is called the local routing heuristic.

The pseudo code for the local routing heuristic is as follows:

Local_routing_heuristic( )

X-Y partitioning (Section 3.2.2.1);
Find_optical_centroid( ) (Figure 22);
for (each partition) {
    Find the optimal bending radius for the optical centroids at each partition;
    bending radius = the smallest bending radius among the optimal bending radii of each stage;
}
Calculate total loss (Section 3.2.2.5);
Calculate delay (Section 3.2.2.6);

Figure 25. Pseudo code for local routing heuristic
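
The radius selection in the loop of Figure 25 amounts to a minimum over per-interconnect maxima. A minimal sketch follows, assuming the largest feasible bending radius of every interconnect has already been determined for each recursion stage; the function name `stage_bending_radii` and the input values are made up for illustration.

```python
from typing import Dict, List


def stage_bending_radii(max_radius_by_stage: Dict[int, List[float]]) -> Dict[int, float]:
    """For each recursion stage, pick one common bending radius.

    max_radius_by_stage maps a stage index to the largest feasible bending
    radius of every interconnect in that stage. The common radius is the
    smallest of these, so that every interconnect in the stage can use it.
    """
    return {stage: min(radii) for stage, radii in max_radius_by_stage.items()}


if __name__ == "__main__":
    # Hypothetical per-interconnect maximum radii in mm.
    feasible = {1: [0.4, 0.6, 0.5], 2: [1.5, 2.0], 3: [4.8]}
    print(stage_bending_radii(feasible))
```

Using one radius per stage sacrifices some bending loss on the interconnects that could have used a larger radius, in exchange for equal path lengths within the stage.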


3.2.3.  Results  and  Analysis 


3.2.3.1. Preliminary Result

The preliminary result for a 4-fanout structure is shown in Figure 26.

Figure 26. 1-to-4 H-tree structure layout with two enlarged microphotographs of fabrication


This layout was designed using BOSS and fabricated. Figure 26 also shows two microphotographs of the fabricated inverted metal-semiconductor-metal (I-MSM) photodetectors embedded in the BCB (Benzocyclobutene) polymer waveguide on the SiO2/Si substrate. The dimension of the board is 5 × 5 cm². The bending radius of every optical waveguide is 1 mm.

3.2.3.2.  Symmetric  Structure 


An  H-tree  system  with  fanout  of  256  has  been  designed.  Figure  27  shows  the  unoptimized 
layout. 


The dimension of the board is 10 × 10 cm² and the bending radius is 1.5 mm. The distance between detectors is 6 mm in the x- and y-directions.

The optimized version of the layout in Figure 27, obtained by using waveguides with different bending radii, is shown in Figure 28.

This layout was carefully designed to avoid waveguide crossings. The bending radii of the stages, from a detector to the optical data input, are 1.5, 1.5, 3, 3, 6, 6, 7 and 7 mm. The total loss along the longest path from the optical data input to a detector, as the bending radius is changed, is calculated and shown in Figure 29.


The total loss includes the bending loss of the optical waveguide, the propagation loss and the splitting loss described in Section 3.2.2.5. The maximum power saving after optimization is about 15%.

3.2.3.3. Asymmetric Structure

With the new L-shaped optical waveguide, BOSS can design asymmetric structures while finding the best location of the optical data input.

Figure 30 shows an asymmetric clock routing of fanout 64 utilizing the proposed MOCS algorithm (see Section 3.2.2.2) with L-shaped optical waveguides.


Figure 30. Asymmetric clock routing of fanout 64 without optimization

The dimension of the board is 5 × 5 cm². Each bending radius is 200 µm.

Through the use of different bending radii, the layout of Figure 30 is optimized, resulting in the layout of Figure 31.

Figure 31. Asymmetric clock routing of fanout 64 with optimization


The  bending  radii  of  each  stage  from  a  detector  to  an  optical  data  input  are  0.2,  0.2,  1.5,  1.8, 
4.2,  and  5.0mm. 

An asymmetric clock routing of fanout 256 is shown in Figure 32. The layout is very similar to the H-tree structure.

Figure 33 is an optimized version of Figure 32 with different bending radii. Note that no additional timing skew is introduced in Figure 33 relative to Figure 32.

The bending radii of the stages, from a detector to the optical data input, are 0.4, 0.4, 1.5, 1.5, 4.8, 5.4, 10.8 and 11.6 mm.

Table 5 shows the total power loss along the longest path for each bending stage. As described earlier, the bending loss is negligible for bending radii above 240 µm. This is why the results do not appear to be affected by the bending loss.


Table 5. Power loss along the longest path for each stage (dB)

Structure    Fanout   Stage 1   Stage 2   Stage 3   Stage 4   Stage 5   Stage 6   Stage 7   Stage 8
Symmetric    16       3.82      7.12      10.3      13.9      -         -         -         -
Symmetric    64       3.84      7.12      10.3      13.4      16.5      20.1      -         -
Symmetric    256      4.58      8.34      11.7      15.0      18.2      21.4      24.4      28.9
Asymmetric   64       3.99      7.90      11.2      14.3      17.4      21.0      -         -
Asymmetric   256      4.61      8.39      11.9      15.2      18.5      21.6      24.6      27.8

Table 6 shows the minimum, maximum and average signal timing skew and the maximum time of flight along the longest path for the different structures.


Table 6. Signal timing skew along the longest path

Structure    Fanout   Minimum skew (psec)   Maximum skew (psec)   Average skew (psec)   Time of flight (psec)
Symmetric    16       0                     0                     0                     163.91
Symmetric    64       0                     0                     0                     240.16
Symmetric    256      0                     0                     0                     599.89
Asymmetric   64       7.5                   7.5                   1.28                  255.82
Asymmetric   256      23.9                  26.1                  3.48                  614.38

The maximum signal timing skew is about 26.1 psec when the time of flight is 614.38 psec. This result implies that the signal timing skew in the simulated structures is small (about 4% of the time of flight). For the same structure with electrical interconnections, Equation (14) gives a signal timing skew of about 87.43 psec.

3.2.4.  Summary 

A new approach to optimized clock routing using optical waveguides is presented. The results are very encouraging: the signal timing skew is at most 26.1 psec for a signal flight time of 614.38 psec. A reduction of about 15% in power consumption is also obtained over clock nets routed with existing (optical) methods.

4.  Conclusion 

We have presented two new approaches to the design and optimization of optoelectronic systems using optical interconnections. The first approach is for data path routing on SoC utilizing free-space optical interconnect technology. The second approach is for clock routing between modules using optical waveguide interconnect. We have modeled optical interconnections by taking various parameters into account. Using these models, data path routing and clock routing are optimized by our new optimization algorithms. Experimental results show that our approaches can improve the overall performance of optoelectronic systems compared to conventional electrical systems in terms of system speed and energy consumption.

5.  Published  Papers 

[1] C. Seo and A. Chatterjee, “A CAD Tool for SoC Placement and Routing with Free-Space Optical Interconnect,” ICCD, pp. 24-29, September 2002.

[2] C. Seo, A. Chatterjee and T. J. Drabik, “Wiring Optimization for Propagation Delay and Power Consumption of Multichip Modules with Free-Space Optical Interconnect,” ECTC, May 2003.

[3] J. Shin, C. Seo, A. Chellappa, M. Brooke, A. Chatterjee and N. M. Jokerst, “Comparison of Electrical Interconnect and Optical Interconnect,” ECTC, May 2003.

[4] C. Seo and A. Chatterjee, “Free-Space Optical Interconnect for High-Performance MCM Systems,” IWSOC, June 2003.

References 

[1]  “International  technology  roadmap  for  semiconductors  2002  update,”  Semiconductor 
Industry  Association,  2002. 

[2]  D.  A.  Miller,  “Rationale  and  challenges  for  optical  interconnects  to  electronic  chips,”  Proc. 
IEEE,  vol.  88,  no.  6,  pp.  728-749,  Jun.  2000. 

[3]  R.  W.  Keyes,  “Power  dissipation  in  information  processing,”  Science,  vol.  168,  pp.  796-801, 
1970. 

[4]  H.  M.  Gibbs,  Optical  Bistability:  Controlling  Light  with  Light.  Orlando,  FL:  Academic, 
1985. 

[5] J. W. Goodman, F. J. Leonberger, S. Y. Kung, and R. A. Athale, “Optical interconnections for VLSI systems,” Proc. IEEE, vol. 72, no. 7, pp. 850-866, Jul. 1984.

[6]  D.  A.  Miller,  et  al,  “Bandedge  electro-absorption  in  quantum  well  structures:  The  quantum 
confined  stark  effect,”  Phys.  Rev.  Letter,  vol.  53,  pp.  2173-2177,  1984. 




[7] H. Soda, et al, “GaInAsP/InP surface emitting injection lasers,” Jpn. J. Appl. Phys., vol. 18, pp. 2329-2330, 1979.

[8] J. L. Jewell, et al, “Low-threshold electrically pumped vertical-cavity surface-emitting microlasers,” Electron. Letter, vol. 25, pp. 1123-1124, 1989.

[9]  T.  J.  Drabik,  “Optoelectronic  integrated  systems  based  on  free-space  interconnects  with  an 
arbitrary  degree  of  space-variance,”  Proc.  IEEE,  vol.  82,  no.  11,  pp.  1592-1622,  Nov.  1994. 

[10] G. I. Yayla, et al, “A Prototype 3D Optically Interconnected Neural Network,” Proc. IEEE, vol. 82, no. 11, Nov. 1994.

[11]  S.  P.  Levitan,  et  al,  “Computer-Aided  Design  of  Free-Space  Opto-Electronic  Systems,” 
DAC,  pp.  768-773,  1997. 

[12]  X.  Zheng,  et  al,  “High  Speed  Parallel  Multi-chip  Interconnection  With  Free  Space  Optics,” 
International  Conference  on  Parallel  Interconnects,  pp.  13-20,  1999. 

[13]  M.  Forbes,  J.  Gourlay  and  M.  Desmulliez,  “Optically  interconnected  electronic  chips:  a 
tutorial  and  review  of  the  technology,”  Electronics  &  Communication  Engineering  Journal, 
pp.  221-232,  Oct.  2001. 

[14]  N.  Savage,  “Linking  with  Light,”  IEEE  Spectrum,  pp.  32-36,  Aug.  2002. 

[15]  J.  Wilson  and  J.  F.  B.  Hawkes,  Optoelectronics  -  An  introduction,  Prentice  Hall,  1989. 

[16]  A  short  history  of  fiber  optics.  By  Jeff  Hecht, 
http://www.sff.net/people/Jeff.Hecht/history.html. 

[17]  B.  D.  Clymer  and  J.  W.  Goodman,  “Timing  uncertainty  for  receivers  in  optical  clock 
distribution  for  VLSI,”  Opt.  Eng.,  vol.  27,  pp.  944-954,  Nov.  1988. 

[18]  P.  J.  Delfyett,  D.  H.  Hartman  and  S.  Z.  Ahmad,  “Optical  clock  distribution  using  a  mode- 
locked  semiconductor  laser-diode  system,”  J.  Lightwave  Technology,  vol.  9,  pp.  1646-1649, 
Dec.  1991. 

[19] J. H. Yeh, R. K. Kostuk and K. Y. Tu, “Board level H-tree optical clock distribution with substrate mode holograms,” Journal of Lightwave Technology, vol. 13, pp. 1566-1578, Jul. 1995.




[20]  Y.  Li,  “Demonstration  of  fiber-based  board-level  optical  clock  distributions,”  International 
Conference  on  Massively  Parallel  Processing,  pp.  224-228,  1998. 

[21] B. Bihari, et al, “Optical clock distribution in super-computers using Polyimide-based waveguides,” SPIE, vol. 3632, pp. 123-133, Jan. 1999.

[22]  Travelling  Salesman  Problem  Using  Genetic  Algorithms,  http://www.lalena.com/ai/tsp/. 

[23]  N.  A.  Sherwani,  Algorithms  for  VLSI  physical  design  automation.  Kluwer  Academic 
Publishers,  1999. 

[24]  G.  I.  Yayla,  P.  J.  Marchand,  and  S.  C.  Esener,  “Speed  and  energy  analysis  of  digital 
interconnections:  comparison  of  on-chip,  off-chip,  and  free-space  technologies,”  Applied 
Optics,  vol.  37,  no.  2,  pp. 205-227,  Jan.  1998. 

[25]  D.  E.  Goldberg,  “Genetic  algorithms  in  search,  optimization,  and  machine  learning,” 
Addison-Wesley  Publishing  Company,  1989. 

[26]  M.  A.  B.  Jackson,  A.  Srinivasan  and  E.  S.  Kuh,  “Clock  routing  for  high-performance  ICs,” 
DAC,  pp.  573-579,  Jun.  1990. 

[27]  D.  L.  Lee,  Electromagnetic  principles  of  integrated  optics.  New  York  Wiley,  1986. 

[28]  S.  Y.  Cho,  S.  W.  Seo,  M.  A.  Brooke  and  N.  M.  Jokerst,  “Integrated  Detectors  for  Embedded 
Optical  Interconnections  on  Electrical  Boards,  Modules,  and  Integrated  Circuits,”  IEEE 
Journal  of  Special  Topics  in  Quantum  Electronics,  2002. 

[29]  D.  A.  Miller,  “Dense  Optical  Interconnections  for  Silicon  Electronics”  in  Trends  in  Optics 
1995,  International  Commission  for  Optics/  Academic  Press,  vol.  3,  pp.  207-222,  1996. 

[30]  D.  Agarwal,  “Optical  Interconnects  to  Silicon  Chips  using  Short  Pulses”,  Ph.D.  Thesis, 
Stanford  University,  Sep.  2002. 

[31] J. Cho, et al, “High performance MCM routing,” IEEE DTC, pp. 27-37, Dec. 1993.

[32]  H.  B.  Bakoglu,  Circuit,  Interconnects  and  Packaging  for  VLSI.  Addison  Wesley,  1990. 

[33]  M.  Goets,  “System  on  Chip  Design  Methodology  Applied  to  System  in  Package 
Architecture,”  ECTC,  pp. 254-258,  2002. 

[34]  L.  Benini  and  G.  De  Micheli,  “Networks  on  chips:  a  new  SOC  paradigm,”  IEEE  Computer, 
vol.  35,  pp.  70-78,  Jan.  2002. 




[35]  J.  Becker,  M.  Glesner  and  T.  Pionteck,  “Adaptive  systems-on-chip:  architectures, 
technologies  and  applications,”  ICSD,  pp.  2-7,  2001. 

[36] Matsuzawa, “RF-SoC-expectations and required conditions,” IEEE Trans. on Microwave Theory and Techniques, vol. 50, pp. 245-253, Jan. 2002.

[37] H. Lee, et al, “A New Formulation for SOC Floorplan Area Minimization Problem,” DATE, 2002.

[38]  J.  Jahns,  “Integrated  free-space  optical  interconnects  for  chip-to-chip  communications,” 
Massively  Parallel  Processing,  pp.  20-23,  Jun.  1998. 

[39]  C.  K.  Cheng,  et  al,  “Computer  aided  design  and  packaging  optoelectronic  systems  with  free 
space  optical  interconnects,”  Custom  Integrated  Circuits  Conference,  pp.  29.3.1-29.3.4,  May 
1993. 

[40]  J.  A.  Neff,  “VCSEL-based  smart  pixels  for  free-space  optical  interconnection,”  LEOS,  vol. 
2,  pp.  151-152,  Nov.  1997. 

[41] J. Davis, V. De and J. Meindl, “A Stochastic Wire Length Distribution for Gigascale Integration (GSI) - Part 1: Derivation and Validation,” IEEE Trans. on Electron Devices, vol. 45, no. 3, pp. 580-589, Mar. 1998.

[42]  M.  R.  Feldman,  “Holographic  optical  interconnects  for  multichip  modules,”  in  Proc.  SPIE, 
vol.  1390,  pp.427-433,  1990. 

[43] B. B. Bhattacharya, S. Das and S. C. Nandy, “High performance MCM routing: a new approach,” VLSI Design, pp. 564-569, Jan. 1999.

[44]  C.  Sechen,  “Chip-planning,  placement,  and  global  routing  of  macro/custom  cell  integrated 
circuits  using  simulated  annealing,”  DAC,  pp.  73-80,  Jun.  1988. 

[45]  V.  Schnecke  and  O.  Vornberger,  “Genetic  design  of  VLSI-layouts,”  GALESIA,  pp.  430-435, 
Sep.  1995. 

[46]  N.  G.  Bourbakis  and  M.  Mortazavi,  “An  efficient  building  block  layout  methodology  for 
compact  placement,”  VLSI,  pp.  118-123,  Mar.  1995. 




[47] C. S. Baldwin, et al, “High performance package designs for a 1 GHz microprocessor,” IEEE Trans. on Advanced Packaging, vol. 24, pp. 470-476, Nov. 2001.

[48]  J.  Lienig,  M.  N.  S.  Swamy  and  K.  Thulasiraman,  “Routing  algorithms  for  multi-chip 
modules,”  DAC,  pp.  286-291,  Sep.  1992. 

[49]  K.  Bois,  et  al,  “Optimizing  the  package  design  with  electrical  modeling  and  simulation,” 
ECTC,  pp.  111-117,  2001. 

[50] C. L. Valenzuela and P. Y. Wang, “VLSI Placement and Area Optimization Using a Genetic Algorithm to Breed Normalized Postfix Expressions,” IEEE Trans. on Evolutionary Computation, vol. 6, no. 4, pp. 390-401, Aug. 2002.

[51] J. Cong, A. B. Kahng and G. Robins, “Matching-Based Methods for High-Performance Clock Routing,” IEEE TCAD, vol. 12, no. 8, pp. 1157-1169, Aug. 1993.

[52]  S.  L.  Sam,  A.  Chandrakasan  and  D.  Boning,  “Variation  issues  in  on-chip  optical  clock 
distribution,”  IEEE  International  Workshop  on  Statistical  Methodology,  pp.  64-67,  2001. 

[53]  W.  H.  Ryu,  et  al,  “Over  GHz  low-power  RF  clock  distribution  for  a  multiprocessor  digital 
system,”  ECTC2001,  pp.  133-140,  2001. 

[54]  R.  T.  Chen,  et  al,  “Fully  embedded  board-level  guided-wave  optoelectronic  interconnects,” 
Proceedings  of  the  IEEE,  vol.  88,  pp.  780-793,  Jun.  2000. 

[55]  A.  V.  Mule,  et  al,  “An  optical  clock  distribution  network  for  gigascale  integration,” 
Interconnect  Technology  Conference  2000,  pp  6,  2000. 

[56]  K.  J.  Ebeling,  “VCSELs:  prospects  and  challenges  for  optical  interconnects,”  LEOS2000, 
vol.  1,  pp.  7-8,  2000. 

[57]  S.  Tanp,  et  al,  “1-GHz  clock  signal  distribution  for  multi-processor  super  computers,” 
International  Conference  on  Massively  Parallel  Processing  Using  Optical  Interconnections, 
pp.  286-191,  1996. 

[58]  H.  Zarschizky,  et  al,  “Optical  Clock  Distribution  With  A  Compact  Free  Space  Interconnect 
System,”  LEOS1992,  pp.  590-591,  1992. 

[59]  C.  Sebillotte  and  T.  Lemoine,  “Computer  generated  holograms  directly  etched  in  glass  and 
their  use  as  boards  interconnects  means,”  Holographic  Systems,  Components  and 
Applications,  pp.  52-56,  1991. 




[60]  D.  Prongue  and  H.  P.  Herzig,  “Design  and  fabrication  of  HOE  for  clock  distribution  in 
integrated  circuits,”  Holographic  Systems,  Components  and  Applications,  pp.  204-208, 
1989. 

[61]  A.  Naeemi,  P.  Zarkesh-Ha,  C.  S.  Patel,  and  J.  D.  Meindl,  “Performance  improvement  using 
on-board  wires  for  on-chip  interconnects,”  EPEP2000.  pp  325-328,  Oct.  2000. 

[62]  N.  A.  Kurd,  J.  S.  Barkatullah,  R.  O.  Dizon,  T.  D.  Fletcher,  and  P.  D.  Madland,  “A  multi- 
GHz  clocking  scheme  for  Pentium  4  microprocessor,”  IEEE  J.  Solid  State  Circuits,  vol.  36, 
pp.  1647-1653,  Nov.  2001. 

[63]  R.  K.  Krishnamurthy,  K.  Soumyanath,  and  S-L.  Lu,  “High-performance  interconnect  design 
for  system-on-chip,”  Proc.  IEEE  International  ASIC/SOC  Conference,  pp.  424,  Sept.  1999. 

[64] S. K. Tewksbury and L. A. Hornak, “Optical clock distribution in electronic systems,” J. VLSI Signal Processing, vol. 16, pp. 225-246, June-July 1997.

[65]  S.  J.  Walker  and  J.  Jahns,  “Optical  clock  distribution  using  integrated  free-space  optics,” 
Opt.  Communications,  vol.  90,  pp.  359-371,  June  1992. 

[66]  K.  M.  Carrig,  et  al,  “A  clock  methodology  for  high-performance  microprocessors,”  J.  VLSI 
Signal  Process.,  vol.  16,  pp.  217-224,  Jun.  1997. 

[67]  R.  R.  Tummala,  Fundamentals  of  Microsystems  Packaging.  McGraw-Hill,  2001. 




